Census Division No. 10
Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI
Within the limited scope of this paper, we argue that artificial general intelligence cannot emerge from current neural network paradigms regardless of scale, nor is such an approach healthy for the field at present. Drawing on various notions, discussions, present-day developments and observations, current debates and critiques, experiments, and so on in between philosophy, including the Chinese Room Argument and Gödelian argument, neuroscientific ideas, computer science, the theoretical consideration of artificial intelligence, and learning theory, we address conceptually that neural networks are architecturally insufficient for genuine understanding. They operate as static function approximators of a limited encoding framework - a 'sophisticated sponge' exhibiting complex behaviours without structural richness that constitute intelligence. We critique the theoretical foundations the field relies on and created of recent times; for example, an interesting heuristic as neural scaling law (as an example, arXiv:2001.08361 ) made prominent in a wrong way of interpretation, The Universal Approximation Theorem addresses the wrong level of abstraction and, in parts, partially, the question of current architectures lacking dynamic restructuring capabilities. We propose a framework distinguishing existential facilities (computational substrate) from architectural organization (interpretive structures), and outline principles for what genuine machine intelligence would require, and furthermore, a conceptual method of structuralizing the richer framework on which the principle of neural network system takes hold.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (14 more...)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.67)
- Education (0.67)
- Government > Regional Government > North America Government > United States Government (0.45)
- Asia > Middle East > Jordan (0.06)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Abramov, Roman, Steinbauer, Felix, Kasneci, Gjergji
Transformers have achieved great success in numerous NLP tasks but continue to exhibit notable gaps in multi-step factual reasoning, especially when real-world knowledge is sparse. Recent advances in grokking have demonstrated that neural networks can transition from memorizing to perfectly generalizing once they detect underlying logical patterns - yet these studies have primarily used small, synthetic tasks. In this paper, for the first time, we extend grokking to real-world factual data and address the challenge of dataset sparsity by augmenting existing knowledge graphs with carefully designed synthetic data to raise the ratio $ϕ_r$ of inferred facts to atomic facts above the threshold required for grokking. Surprisingly, we find that even factually incorrect synthetic data can strengthen emergent reasoning circuits rather than degrade accuracy, as it forces the model to rely on relational structure rather than memorization. When evaluated on multi-hop reasoning benchmarks, our approach achieves up to 95-100% accuracy on 2WikiMultiHopQA - substantially improving over strong baselines and matching or exceeding current state-of-the-art results. We further provide an in-depth analysis of how increasing $ϕ_r$ drives the formation of generalizing circuits inside Transformers. Our findings suggest that grokking-based data augmentation can unlock implicit multi-hop reasoning capabilities, opening the door to more robust and interpretable factual reasoning in large-scale language models.
- Media > Film (0.67)
- Leisure & Entertainment (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)
A Markov Random Field Multi-Modal Variational AutoEncoder
Oubari, Fouad, Baha, Mohamed El, Meunier, Raphael, Décatoire, Rodrigue, Mougeot, Mathilde
Recent advancements in multimodal Variational AutoEncoders (VAEs) have highlighted their potential for modeling complex data from multiple modalities. However, many existing approaches use relatively straightforward aggregating schemes that may not fully capture the complex dynamics present between different modalities. This work introduces a novel multimodal VAE that incorporates a Markov Random Field (MRF) into both the prior and posterior distributions. This integration aims to capture complex intermodal interactions more effectively. Unlike previous models, our approach is specifically designed to model and leverage the intricacies of these relationships, enabling a more faithful representation of multimodal data. Our experiments demonstrate that our model performs competitively on the standard PolyMNIST dataset and shows superior performance in managing complex intermodal dependencies in a specially designed synthetic dataset, intended to test intricate relationships.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)
- (2 more...)
- Health & Medicine (0.46)
- Banking & Finance (0.46)
LLM-Collaboration on Automatic Science Journalism for the General Audience
Jiang, Gongyao, Shi, Xinran, Luo, Qiong
Science journalism reports current scientific discoveries to non-specialists, aiming to enable public comprehension of the state of the art. However, this task can be challenging as the audience often lacks specific knowledge about the presented research. To address this challenge, we propose a framework that integrates three LLMs mimicking the real-world writing-reading-feedback-revision workflow, with one LLM acting as the journalist, a smaller LLM as the general public reader, and the third LLM as an editor. The journalist's writing is iteratively refined by feedback from the reader and suggestions from the editor. Our experiments demonstrate that by leveraging the collaboration of two 7B and one 1.8B open-source LLMs, we can generate articles that are more accessible than those generated by existing methods, including advanced models such as GPT-4.
- Africa > Uganda (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (5 more...)
- Media > News (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.72)
- Health & Medicine > Therapeutic Area > Immunology (0.50)
SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients
Verma, Tushar, Singh, Jyotsna, Bhartari, Yash, Jarwal, Rishi, Singh, Suraj, Singh, Shubhkarman
Small object detection in aerial imagery presents significant challenges in computer vision due to the minimal data inherent in small-sized objects and their propensity to be obscured by larger objects and background noise. Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases, which adversely affect their performance with objects of varying orientations and scales. This underscores the need for more adaptable, lightweight models. In response, this paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects. Firstly, we explore the use of the SAHI framework on the newly introduced lightweight YOLO v9 architecture, which utilizes Programmable Gradient Information (PGI) to reduce the substantial information loss typically encountered in sequential feature extraction processes. The paper employs the Vision Mamba model, which incorporates position embeddings to facilitate precise location-aware visual understanding, combined with a novel bidirectional State Space Model (SSM) for effective visual context modeling. This State Space Model adeptly harnesses the linear complexity of CNNs and the global receptive field of Transformers, making it particularly effective in remote sensing image classification. Our experimental results demonstrate substantial improvements in detection accuracy and processing efficiency, validating the applicability of these approaches for real-time small object detection across diverse aerial scenarios. This paper also discusses how these methodologies could serve as foundational models for future advancements in aerial object recognition technologies. The source code will be made accessible here.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Two Hills County No. 21 (0.04)
- Asia > China (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
- Transportation (0.68)
- Leisure & Entertainment > Sports (0.68)
Towards Explainable Clustering: A Constrained Declarative based Approach
Guilbert, Mathieu, Vrain, Christel, Dao, Thi-Bich-Hanh
The domain of explainable AI is of interest in all Machine Learning fields, and it is all the more important in clustering, an unsupervised task whose result must be validated by a domain expert. We aim at finding a clustering that has high quality in terms of classic clustering criteria and that is explainable, and we argue that these two dimensions must be considered when building the clustering. We consider that a good global explanation of a clustering should give the characteristics of each cluster taking into account their abilities to describe its objects (coverage) while distinguishing it from the other clusters (discrimination). Furthermore, we aim at leveraging expert knowledge, at different levels, on the structure of the expected clustering or on its explanations. In our framework an explanation of a cluster is a set of patterns, and we propose a novel interpretable constrained clustering method called ECS for declarative clustering with Explainabilty-driven Cluster Selection that integrates structural or domain expert knowledge expressed by means of constraints. It is based on the notion of coverage and discrimination that are formalized at different levels (cluster / clustering), each allowing for exceptions through parameterized thresholds. Our method relies on four steps: generation of a set of partitions, computation of frequent patterns for each cluster, pruning clusters that violates some constraints, and selection of clusters and associated patterns to build an interpretable clustering. This last step is combinatorial and we have developed a Constraint-Programming (CP) model to solve it. The method can integrate prior knowledge in the form of user constraints, both before or in the CP model.
- Africa (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.88)
Convergence of Some Convex Message Passing Algorithms to a Fixed Point
Voracek, Vaclav, Werner, Tomas
A popular approach to the MAP inference problem in graphical models is to minimize an upper bound obtained from a dual linear programming or Lagrangian relaxation by (block-)coordinate descent. Examples of such algorithms are max-sum diffusion and sequential tree-reweighted message passing. Convergence properties of these methods are currently not fully understood. They have been proved to converge to the set characterized by local consistency of active constraints, with unknown convergence rate; however, it was not clear if the iterates converge at all (to any single point). We prove a stronger result (which was conjectured before but never proved): the iterates converge to a fixed point of the algorithm. Moreover, we show that they achieve precision $\varepsilon>0$ in $\mathcal{O}(1/\varepsilon)$ iterations. We first prove this for a version of coordinate descent applied to a general piecewise-affine convex objective, using a novel proof technique. Then we demonstrate the generality of this approach by reducing some popular coordinate-descent algorithms to this problem. Finally we show that, in contrast to our main result, a similar version of coordinate descent applied to a constrained optimization problem need not converge.
- Asia > Middle East > Jordan (0.05)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)
- (5 more...)
Improving Automatic VQA Evaluation Using Large Language Models
Mañas, Oscar, Krojer, Benno, Agrawal, Aishwarya
8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the performance of VQA systems. Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment. In this work, we propose to leverage the in-context learning capabilities of instruction-tuned large language models (LLMs) to build a better VQA metric. We formulate VQA evaluation as an answer-rating task where the LLM is instructed to score the accuracy of a candidate answer given a set of reference answers. We demonstrate the proposed metric better correlates with human judgment compared to existing metrics across several VQA models and benchmarks. We hope wide adoption of our metric will contribute to better estimating the research progress on the VQA task. We plan to release the evaluation code and collected human judgments.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Texas > Yoakum County (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Two Hills County No. 21 (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Minburn County No. 27 (0.04)
Image Clustering Conditioned on Text Criteria
Kwon, Sehyun, Park, Jaeseung, Kim, Minkyu, Cho, Jaewoong, Ryu, Ernest K., Lee, Kangwook
Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC|TC), and it represents a different paradigm of image clustering. IC|TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC|TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Vermilion River County (0.04)
- North America > Canada > Alberta > Census Division No. 10 > Two Hills County No. 21 (0.04)
- (2 more...)
- Media > Music (1.00)
- Leisure & Entertainment > Sports (0.93)
- Transportation > Ground > Road (0.46)